12 research outputs found
Learning non-linear invariants for unsupervised out-of-distribution detection
An important hurdle to overcome before machine learning models can be reliably deployed in practice is identifying when samples differ from those seen during training, as the outputs for unexpected samples are often confidently incorrect while not being identifiable as such. This problem is known as out-of-distribution (OOD) detection. A popular approach for the unsupervised OOD case is to reject samples with a high Mahalanobis distance with respect to the mean features of the training data. Recent work showed that the Mahalanobis distance can be thought of as finding the training data invariants and rejecting OOD samples that violate them. A key limitation of this approach is that it captures linear relations only. Here, we present a novel method capable of identifying non-linear invariants in the data. These are learned using a reversible neural network consisting of alternating rotation and coupling layers. Results on a variety of tasks show it to be the best method overall, achieving state-of-the-art results on some of the experiments.
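The reversible architecture mentioned in this abstract can be illustrated with a minimal NumPy sketch. It alternates an orthogonal matrix (a rotation, possibly combined with a reflection) with an additive coupling layer and checks that the mapping can be inverted exactly. The layer sizes, the `tanh` shift function, and the random untrained weights are assumptions made for illustration; the abstract does not describe the actual layers or the training objective.

```python
import numpy as np

def random_rotation(dim, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix Q.
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q

def coupling_forward(x, w):
    # Additive coupling: the first half passes through unchanged and
    # shifts the second half by a non-linear function of the first half.
    h = x.shape[1] // 2
    x1, x2 = x[:, :h], x[:, h:]
    return np.concatenate([x1, x2 + np.tanh(x1 @ w)], axis=1)

def coupling_inverse(y, w):
    # The shift depends only on the untouched half, so it can be subtracted.
    h = y.shape[1] // 2
    y1, y2 = y[:, :h], y[:, h:]
    return np.concatenate([y1, y2 - np.tanh(y1 @ w)], axis=1)

def forward(x, layers):
    for rot, w in layers:
        x = coupling_forward(x @ rot, w)
    return x

def inverse(z, layers):
    for rot, w in reversed(layers):
        # Undo the coupling first, then the rotation (Q^-1 = Q^T).
        z = coupling_inverse(z, w) @ rot.T
    return z

rng = np.random.default_rng(0)
dim = 4  # hypothetical feature dimension (must be even here)
layers = [
    (random_rotation(dim, rng), rng.normal(size=(dim // 2, dim // 2)))
    for _ in range(3)
]
x = rng.normal(size=(5, dim))
z = forward(x, layers)
x_rec = inverse(z, layers)
print(np.allclose(x, x_rec))
```

The rotations mix all coordinates between coupling layers, so every coordinate is eventually transformed even though each coupling layer leaves half of them untouched.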
Comparison of outlier detection methods on astronomical image data
Among the many challenges posed by the huge data volumes produced by the new
generation of astronomical instruments there is also the search for rare and
peculiar objects. Unsupervised outlier detection algorithms may provide a
viable solution. In this work we compare the performances of six methods: the
Local Outlier Factor, Isolation Forest, k-means clustering, a measure of
novelty, and both a normal and a convolutional autoencoder. These methods were
applied to data extracted from SDSS stripe 82. After discussing the sensitivity
of each method to its own set of hyperparameters, we combine the results from
each method to rank the objects and produce a final list of outliers.
Comment: Preprint version of the accepted manuscript to appear in the Volume
"Intelligent Astrophysics" of the series "Emergence, Complexity and
Computation", Book eds. I. Zelinka, D. Baron, M. Brescia, Springer Nature
Switzerland, ISSN: 2194-728
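Combining the results of several detectors into one ranking, as described in the abstract above, could be sketched as follows. The abstract does not say how the results are combined, so averaging per-method ranks is an assumption, and the score values are made up for illustration.

```python
import numpy as np

def mean_rank(scores):
    """Average each object's rank across methods.

    scores: array of shape (n_methods, n_objects), higher = more anomalous.
    Ranks are scale-free, so methods with different score ranges can be mixed.
    Ties within a method are broken arbitrarily by argsort.
    """
    scores = np.asarray(scores, dtype=float)
    # argsort of argsort gives the rank of each object (0 = least anomalous).
    ranks = scores.argsort(axis=1).argsort(axis=1)
    return ranks.mean(axis=0)

# Hypothetical outlier scores from three methods for four objects.
scores = np.array([
    [0.1, 0.9, 0.3, 0.2],    # method A
    [5.0, 40.0, 7.0, 6.0],   # method B, on a different scale
    [0.2, 0.8, 0.1, 0.7],    # method C
])
combined = mean_rank(scores)
outlier_order = np.argsort(-combined)  # most anomalous object first
print(combined, outlier_order)
```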
Stochastic Segmentation with Conditional Categorical Diffusion Models
Semantic segmentation has made significant progress in recent years thanks to
deep neural networks, but the common objective of generating a single
segmentation output that accurately matches the image's content may not be
suitable for safety-critical domains such as medical diagnostics and autonomous
driving. Instead, multiple possible correct segmentation maps may be required
to reflect the true distribution of annotation maps. In this context,
stochastic semantic segmentation methods must learn to predict conditional
distributions of labels given the image, but this is challenging due to the
typically multimodal distributions, high-dimensional output spaces, and limited
annotation data. To address these challenges, we propose a conditional
categorical diffusion model (CCDM) for semantic segmentation based on Denoising
Diffusion Probabilistic Models. Our model is conditioned to the input image,
enabling it to generate multiple segmentation label maps that account for the
aleatoric uncertainty arising from divergent ground truth annotations. Our
experimental results show that CCDM achieves state-of-the-art performance on
LIDC, a stochastic semantic segmentation dataset, and outperforms established
baselines on the classical segmentation dataset Cityscapes.
Comment: Code available at
https://github.com/LarsDoorenbos/ccdm-stochastic-segmentatio
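To make the idea of diffusion over categorical labels concrete, the sketch below implements one forward (noising) step on a one-hot label map using a uniform-noise transition, q(x_t | x_{t-1}) = Cat((1 - beta) x_{t-1} + beta / K). This is one common formulation of categorical diffusion; the abstract does not state which transition CCDM uses, so treat it as an assumption. The learned reverse process and its conditioning on the input image are not shown.

```python
import numpy as np

def forward_step_probs(x_onehot, beta):
    # With probability (1 - beta) keep the current label, otherwise
    # resample it uniformly from the K classes.
    k = x_onehot.shape[-1]
    return (1.0 - beta) * x_onehot + beta / k

def sample_categorical(probs, rng):
    # Inverse-CDF sampling, done independently for every pixel.
    cdf = probs.cumsum(axis=-1)
    u = rng.random(probs.shape[:-1] + (1,))
    idx = (u >= cdf).sum(axis=-1)
    # Guard against floating-point rounding in the last CDF entry.
    return np.minimum(idx, probs.shape[-1] - 1)

rng = np.random.default_rng(0)
num_classes = 3                      # hypothetical number of labels
labels = rng.integers(0, num_classes, size=(4, 4))
x0 = np.eye(num_classes)[labels]     # one-hot label map, shape (4, 4, 3)

probs = forward_step_probs(x0, beta=0.1)
x1 = sample_categorical(probs, rng)  # noised label map, shape (4, 4)
print(x1)
```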
Unsupervised out-of-distribution detection for safer robotically-guided retinal microsurgery
Purpose: A fundamental problem in designing safe machine learning systems is
identifying when samples presented to a deployed model differ from those
observed at training time. Detecting so-called out-of-distribution (OoD)
samples is crucial in safety-critical applications such as robotically-guided
retinal microsurgery, where distances between the instrument and the retina are
derived from sequences of 1D images that are acquired by an
instrument-integrated optical coherence tomography (iiOCT) probe.
Methods: This work investigates the feasibility of using an OoD detector to
identify when images from the iiOCT probe are inappropriate for subsequent
machine learning-based distance estimation. We show how a simple OoD detector
based on the Mahalanobis distance can successfully reject corrupted samples
coming from real-world ex-vivo porcine eyes.
Results: Our results demonstrate that the proposed approach can successfully
detect OoD samples and help maintain the performance of the downstream task
within reasonable levels. MahaAD outperformed a supervised approach trained on
the same kind of corruptions and achieved the best performance in detecting OoD
cases from a collection of iiOCT samples with real-world corruptions.
Conclusion: The results indicate that detecting corrupted iiOCT data through
OoD detection is feasible and does not need prior knowledge of possible
corruptions. Consequently, MahaAD could aid in ensuring patient safety during
robotically-guided microsurgery by preventing deployed prediction models from
estimating distances that put the patient at risk.
Comment: Accepted at IPCAI 202
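A Mahalanobis-distance OoD detector of the kind described above can be sketched in a few lines of NumPy. The synthetic Gaussian features, the covariance regularizer `eps`, and the 99th-percentile threshold are assumptions for illustration; in practice the features would come from a network applied to the iiOCT images.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    # Estimate the mean and the inverse covariance (precision) of the
    # training features; eps keeps the covariance invertible.
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, precision):
    # sqrt((x - mean)^T precision (x - mean)) for every row of x.
    d = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, precision, d))

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))        # stand-in for training features
mean, precision = fit_gaussian(train)

# Reject anything farther out than 99% of the training samples.
threshold = np.quantile(mahalanobis_score(train, mean, precision), 0.99)

test = np.vstack([
    np.zeros((1, 8)),                    # close to the training mean
    np.full((1, 8), 6.0),                # far from the training data
])
scores = mahalanobis_score(test, mean, precision)
is_ood = scores > threshold
print(scores, is_ood)
```

No examples of corruptions are needed to fit the detector, which matches the abstract's point that prior knowledge of possible corruptions is not required.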
Data Invariants to Understand Unsupervised Out-of-Distribution Detection
Unsupervised out-of-distribution (U-OOD) detection has recently attracted much attention due to its importance in mission-critical systems and broader applicability over its supervised counterpart.
Despite this increased attention, U-OOD methods suffer from important shortcomings.
By performing a large-scale evaluation on different benchmarks and image modalities, we show in this work that most popular state-of-the-art methods are unable to consistently outperform a simple anomaly detector based on pre-trained features and the Mahalanobis distance (MahaAD).
A key reason for the inconsistencies of these methods is the lack of a formal description of U-OOD.
Motivated by a simple thought experiment, we propose a characterization of U-OOD based on the invariants of the training dataset.
We show how this characterization is unknowingly embodied in the top-scoring MahaAD method, thereby explaining its quality. Furthermore, our approach can be used to interpret predictions of U-OOD detectors and provides insights into good practices for evaluating future U-OOD methods.
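The link between the Mahalanobis distance and training-set invariants can be illustrated numerically. The squared distance decomposes over the eigenvectors of the training covariance, with each squared projection divided by the variance along that direction, so directions in which the training data barely varies dominate the score when a test sample moves along them. The toy data below, in which the third coordinate nearly equals the sum of the first two, is an assumption made for illustration and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Training data satisfying the approximate linear invariant x0 + x1 - x2 = 0.
a = rng.normal(size=(1000, 2))
x2 = a.sum(axis=1, keepdims=True) + 0.01 * rng.normal(size=(1000, 1))
train = np.hstack([a, x2])

mean = train.mean(axis=0)
cov = np.cov(train, rowvar=False)
# eigh returns eigenvalues in ascending order, so index 0 is the
# lowest-variance direction, i.e. the strongest invariant.
eigvals, eigvecs = np.linalg.eigh(cov)

def squared_mahalanobis_terms(x):
    # Per-direction contributions; their sum is the squared distance.
    proj = (x - mean) @ eigvecs
    return proj ** 2 / eigvals

respects = np.array([1.0, 2.0, 3.0])   # 1 + 2 - 3 = 0, invariant holds
violates = np.array([1.0, 2.0, 0.0])   # 1 + 2 - 0 = 3, invariant broken

terms_ok = squared_mahalanobis_terms(respects)
terms_bad = squared_mahalanobis_terms(violates)
print(terms_ok.sum(), terms_bad.sum())
```

Inspecting which term dominates also gives a simple way to see which invariant a rejected sample violates.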
SS3D: Unsupervised Out-of-Distribution Detection and Localization for Medical Volumes
We present an extension of the self-supervised outlier detection (SSD) framework to the three-dimensional case. We first apply contrastive learning on a network using a general dataset of two-dimensional slices randomly sampled from all the available training data. This network serves as a latent embedding encoder of the input images. We model the in-distribution latent density as a multivariate Gaussian, fitted to the embeddings of the training slices. At test time, each test sample is scored by summing the Mahalanobis distances from all its slices to the means of the learned Gaussians. While mainly meant as a sample-level method, this approach additionally enables coarse localization, scoring each voxel by the minimum Mahalanobis distance among the slices that contain it. On the sample-level task of the 2021 MICCAI Medical Out-of-Distribution Analysis Challenge, our method ranked second on the challenging abdominal dataset, and fourth overall. Moreover, we show that with pretrained features and the right choice of architecture, a further boost in performance can be gained.
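The two scoring rules in this abstract (sum over slices for the sample, minimum over containing slices for each voxel) can be sketched as below. The sketch assumes that slices are taken along all three axes, so that voxel (i, j, k) lies in exactly one slice per axis, and that the per-slice Mahalanobis distances have already been computed; the distance values are made up.

```python
import numpy as np

def sample_score(slice_distances):
    # Sample-level score: sum of the distances of all slices of the volume.
    return float(sum(d.sum() for d in slice_distances))

def voxel_scores(slice_distances):
    # Voxel (i, j, k) is contained in slice i along axis 0, slice j along
    # axis 1, and slice k along axis 2; take the minimum of the three.
    dx, dy, dz = slice_distances
    return np.minimum(
        np.minimum(dx[:, None, None], dy[None, :, None]),
        dz[None, None, :],
    )

# Hypothetical per-slice distances for a 3 x 4 x 2 volume.
dx = np.array([1.0, 5.0, 1.0])
dy = np.array([1.0, 4.0, 1.0, 1.0])
dz = np.array([1.0, 6.0])

total = sample_score([dx, dy, dz])
vox = voxel_scores([dx, dy, dz])
print(total, vox.shape, vox[1, 1, 1])
```

Taking the minimum means a voxel is only scored as anomalous when every slice passing through it looks anomalous, which is what makes the localization coarse but conservative.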
Generating astronomical spectra from photometry with conditional diffusion models
A trade-off between speed and information controls our understanding of astronomical objects. Fast-to-acquire photometric observations provide global properties, while costly and time-consuming spectroscopic measurements enable a better understanding of the physics governing their evolution. Here, we tackle this problem by generating galaxy spectra directly from photometry, through which we obtain an estimate of their intricacies from easily acquired images. This is done by using multimodal conditional diffusion models, where the best out of the generated spectra is selected with a contrastive network. Initial experiments on minimally processed SDSS data show promising results.
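Selecting the best of several generated spectra with a contrastive network could look like the sketch below. It assumes the network embeds photometry and spectra into a shared space and that candidates are ranked by cosine similarity to the photometry embedding. Both the similarity measure and the embedding values are assumptions, since the abstract does not give these details.

```python
import numpy as np

def select_best(photo_emb, spectra_embs):
    # Normalize so the dot product equals the cosine similarity.
    p = photo_emb / np.linalg.norm(photo_emb)
    s = spectra_embs / np.linalg.norm(spectra_embs, axis=1, keepdims=True)
    sims = s @ p
    return int(np.argmax(sims)), sims

# Hypothetical embeddings: one photometric image, three generated spectra.
photo = np.array([1.0, 0.0, 1.0])
candidates = np.array([
    [0.0, 1.0, 0.0],     # orthogonal to the image embedding
    [2.0, 0.1, 1.9],     # nearly aligned with it
    [-1.0, 0.0, -1.0],   # pointing the opposite way
])
best, sims = select_best(photo, candidates)
print(best, sims)
```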
: A tool for one-shot sky exploration and its application for detection of active galactic nuclei
Context. Modern sky surveys are producing ever larger amounts of observational data, which makes the application of classical approaches for the classification and analysis of objects challenging and time-consuming. However, this issue may be significantly mitigated by the application of automatic machine and deep learning methods.
Aims. We propose uliss